Abstract

The recently introduced theory of compressive sensing (CS) enables the reconstruction of
sparse or compressible signals from a small set of nonadaptive, linear measurements. If properly
chosen, the number of measurements can be significantly smaller than the ambient dimension
of the signal and yet preserve the significant signal information. Interestingly, it can be shown
that random measurement schemes provide a near-optimal encoding in terms of the required
number of measurements. In this report, we explore another relatively unexplored, though often
alluded to, advantage of using random matrices to acquire CS measurements. Specifically, we
show that random matrices are democratic, meaning that each measurement carries roughly
the same amount of signal information. We demonstrate that by slightly increasing the number
of measurements, the system is robust to the loss of a small number of arbitrary measurements.
In addition, we draw connections to oversampling and demonstrate stability from the loss of
significantly more measurements.

1 Introduction 

The recently developed compressive sensing (CS) framework allows us to acquire a signal $x \in \mathbb{R}^N$
from a small set of $M$ non-adaptive, linear measurements [1,2]. This process can be represented as

$$y = \Phi x \tag{1}$$

where $\Phi$ is an $M \times N$ matrix that models the measurement system. The hope is that we can
design $\Phi$ so that $x$ can be accurately recovered even when $M \ll N$. In general this is not possible,
but if $x$ is $K$-sparse, meaning that it has only $K$ nonzero entries, then it is possible to design $\Phi$
that preserves the information about $x$ using only $M = O(K \log(N/K))$ measurements. The most
commonly studied $\Phi$ that satisfy this bound on $M$ are random, i.e., each entry of $\Phi$ is drawn
independently from some suitable distribution [3]. We will focus our attention on such $\Phi$.

Among the advantages of random measurements is a property commonly referred to as democracy.
While it is not usually rigorously defined in the literature, democracy is usually taken to
mean that each measurement contributes a similar amount of information about the signal $x$ to the
compressed representation $y$ [4-6]. Others have described democracy to mean that each measurement
is equally important (or unimportant) [7]. Despite the fact that democracy is so frequently
touted as an advantage of random measurements, it has received little analytical attention in the CS
context. Perhaps more surprisingly, the property has not been explicitly exploited in applications
until recently [8].

The fact that random measurements are democratic seems intuitive; when using random measurements,
each measurement is a randomly weighted sum of a large fraction (or all) of the entries
of $x$, and since the weights are chosen independently at random, no preference is given to any
particular entries. More concretely, suppose that the measurements $y_1, y_2, \ldots, y_M$ are independent
and identically distributed (i.i.d.) according to some distribution $f_Y$, as is the case for the $\Phi$
considered in this report. Now suppose that we select $\widetilde{M} < M$ of the $y_i$ at random (or according to
some procedure that is independent of $y$). Then clearly, we are left with a length-$\widetilde{M}$ measurement
vector $\widetilde{y}$ such that each $\widetilde{y}_i \sim f_Y$. Stated another way, if we set $D = M - \widetilde{M}$, then there is no
difference between collecting $\widetilde{M}$ measurements and collecting $M$ measurements and deleting $D$ of
them, provided that this deletion is done independently of the actual values of $y$.

However, following this line of reasoning will ultimately lead to a rather weak definition of
democracy. To see this, consider the case where the measurements are deleted by an adversary.
By adaptively deleting the entries of $y$ one can change the distribution of $\widetilde{y}$. For example, the
adversary can delete the $D$ largest elements of $y$, thereby skewing the distribution of $\widetilde{y}$. In many
cases, especially if the same matrix $\Phi$ will be used repeatedly with different measurements being
deleted each time, it would be far better to know that any $\widetilde{M}$ measurements will be sufficient to
reconstruct the signal. This is a significantly stronger requirement.
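
The difference between value-independent and adversarial deletion can be illustrated with a minimal sketch. It is not taken from the report: the function names, the standard Gaussian stand-in for $f_Y$, and the particular adversary (delete the largest magnitudes) are assumptions made only for illustration.

```python
import random

def delete_random(y, d, rng):
    """Keep len(y) - d entries chosen independently of the values of y."""
    keep = rng.sample(range(len(y)), len(y) - d)
    return [y[i] for i in sorted(keep)]

def delete_adversarial(y, d):
    """Delete the d largest-magnitude entries (one possible adversary)."""
    order = sorted(range(len(y)), key=lambda i: abs(y[i]))
    keep = sorted(order[: len(y) - d])
    return [y[i] for i in keep]

def energy(v):
    """Squared l2 norm of a list of numbers."""
    return sum(t * t for t in v)

rng = random.Random(0)
M, D = 200, 50
y = [rng.gauss(0.0, 1.0) for _ in range(M)]  # assumed i.i.d. stand-in for measurements
y_rand = delete_random(y, D, rng)
y_adv = delete_adversarial(y, D)
# The adversary keeps the M - D smallest magnitudes, so it never retains
# more energy than any other subset of the same size.
print(len(y_rand), len(y_adv), energy(y_adv) <= energy(y_rand))
```

Both retained vectors have the same length, but only the randomly thinned one is still a sample from the original distribution.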

In order to formally define this stronger notion of democracy, we must first describe the properties
that a matrix must satisfy to ensure stable reconstruction. Towards that end, we recall the
definition of the restricted isometry property (RIP) for the matrix $\Phi$ [9].

Definition 1. A matrix $\Phi$ satisfies the RIP of order $K$ with constant $\delta \in (0, 1)$ if

$$(1 - \delta)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta)\|x\|_2^2 \tag{2}$$

holds for all $x$ such that $\|x\|_0 \le K$.
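
As a rough numerical illustration of (2), the sketch below draws a Gaussian matrix and records the largest deviation of $\|\Phi x\|_2^2$ from 1 over randomly drawn $K$-sparse unit vectors. The helper names and the parameter values are assumptions for illustration. Because only a random sample of sparse vectors is tested, the result is merely a lower bound on the true RIP constant, not a certificate.

```python
import math
import random

def gaussian_matrix(m, n, rng):
    """M x N matrix with i.i.d. N(0, 1/M) entries."""
    s = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, s) for _ in range(n)] for _ in range(m)]

def random_sparse_unit_vector(n, k, rng):
    """Return (support, values) of a K-sparse vector with unit l2 norm."""
    support = rng.sample(range(n), k)
    vals = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in vals))
    return support, [v / norm for v in vals]

def empirical_delta(phi, k, trials, rng):
    """Largest observed | ||Phi x||_2^2 - 1 | over random K-sparse unit x."""
    n = len(phi[0])
    worst = 0.0
    for _ in range(trials):
        support, vals = random_sparse_unit_vector(n, k, rng)
        sq = 0.0
        for row in phi:
            t = sum(row[j] * v for j, v in zip(support, vals))
            sq += t * t
        worst = max(worst, abs(sq - 1.0))
    return worst

rng = random.Random(1)
phi = gaussian_matrix(128, 256, rng)
delta_hat = empirical_delta(phi, k=5, trials=200, rng=rng)
print(delta_hat)  # a lower bound on the RIP constant of order 5
```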

Much is known about matrices that satisfy the RIP, but for our purposes it suffices to note that
if we draw a random $M \times N$ matrix $\Phi$ whose entries $\phi_{ij}$ are i.i.d. sub-Gaussian random variables,
then provided that

$$M = O(K \log(N/K)), \tag{3}$$

we have that with high probability $\Phi$ will satisfy the RIP of order $K$ with constant $\delta$ [3, 10].

When it is satisfied, the RIP for a matrix $\Phi$ provides a sufficient condition to guarantee successful
sparse recovery using a wide variety of algorithms [9,11-20]. As an example, the RIP of order
$2K$ (with isometry constant $\delta < \sqrt{2} - 1$) is a sufficient condition to permit $\ell_1$-minimization (the
canonical convex optimization problem for sparse approximation) to exactly recover any $K$-sparse
signal and to approximately recover those that are nearly sparse [11]. The same assumption is also
a sufficient condition for robust recovery in noise using a modified $\ell_1$-minimization [11].

Footnote: The term "democracy" was originally introduced with respect to quantization [4,5], i.e., a
democratic quantizer would ensure that each bit is given "equal weight." As the CS framework developed,
it became empirically clear that CS systems exhibited this property with respect to compression [6].

The RIP also provides us with a way to quantify our notion of democracy. To do so, we first
establish some notation that will prove useful throughout this report. Let $\Gamma \subset \{1, 2, \ldots, M\}$. By
$\Phi_\Gamma$ we mean the $|\Gamma| \times N$ matrix obtained by selecting the rows of $\Phi$ indexed by $\Gamma$. Alternatively,
if $\Lambda \subset \{1, 2, \ldots, N\}$, then we use $\Phi_\Lambda$ to indicate the $M \times |\Lambda|$ matrix obtained by selecting the
columns of $\Phi$ indexed by $\Lambda$. Following [8], we now formally define democracy as follows.

Definition 2. Let $\Phi$ be an $M \times N$ matrix, and let $\widetilde{M} \le M$ be given. We say that $\Phi$ is
$(\widetilde{M}, K, \delta)$-democratic if for all $\Gamma$ such that $|\Gamma| \ge \widetilde{M}$ the matrix $\Phi_\Gamma$ satisfies the RIP of
order $K$ with constant $\delta$.

In Section 2 below we present a simple proof that Gaussian matrices are democratic and demonstrate
how the proof can be extended to sub-Gaussian matrices. The core of this proof can be found
in [8], but is included in full in this report. In Section 3 we discuss the implications of the result
and alternative interpretations. Section 4 contains the additional theorems required by the proof.

2 Random matrices are democratic 

We now demonstrate that certain randomly generated matrices are democratic. While the theorem
actually holds (with different constants) for the more general class of sub-Gaussian matrices, for
simplicity we restrict our attention to Gaussian matrices. We provide discussion of the sub-Gaussian
case in Section 4.

Theorem 1. Let $\Phi$ be an $M \times N$ matrix with elements $\phi_{ij}$ drawn according to $\mathcal{N}(0, 1/M)$ and let
$\widetilde{M} \le M$, $K < \widetilde{M}$, and $\delta \in (0, 1)$ be given. Define $D = M - \widetilde{M}$. If

$$M = C_1 (K + D) \log\left(\frac{N + M}{K + D}\right) \tag{4}$$

then with probability exceeding $1 - 3e^{-C_2 M}$ we have that $\Phi$ is $(\widetilde{M}, K, \delta/(1 - \delta))$-democratic, where
$C_1$ is arbitrary and $C_2 = (\delta/8)^2 - \log(42e/\delta)/C_1$.

Proof. Our proof consists of two main steps. We begin by defining the $M \times (N + M)$ matrix
$A = [I \ \Phi]$ formed by appending $\Phi$ to the $M \times M$ identity matrix. Theorem 2, also found in [21],
demonstrates that under the assumptions in the theorem statement, with probability exceeding
$1 - 3e^{-C_2 M}$ we have that $A$ satisfies the RIP of order $K + D$ with constant $\delta$. The second step is
to use this fact to show that all possible $\widetilde{M} \times N$ submatrices of $\Phi$ satisfy the RIP of order $K$ with
constant $\delta/(1 - \delta)$.

Towards this end, we let $\Gamma \subset \{1, 2, \ldots, M\}$ be an arbitrary subset of rows such that $|\Gamma| \ge \widetilde{M}$.
Define $\Lambda = \{1, 2, \ldots, M\} \setminus \Gamma$ and note that $|\Lambda| = D$. Additionally, let

$$P_\Lambda = A_\Lambda A_\Lambda^\dagger \tag{5}$$

be the orthogonal projector onto $\mathcal{R}(A_\Lambda)$, i.e., the range, or column space, of $A_\Lambda$, where
$A_\Lambda^\dagger = (A_\Lambda^T A_\Lambda)^{-1} A_\Lambda^T$ denotes the Moore-Penrose pseudo-inverse of $A_\Lambda$. Furthermore,
we define

$$P_\Lambda^\perp = I - P_\Lambda \tag{6}$$

as the orthogonal projector onto the orthogonal complement of $\mathcal{R}(A_\Lambda)$. In words, this projector
nulls the columns of $A$ corresponding to the index set $\Lambda$. Now, note that $\Lambda \subset \{1, 2, \ldots, M\}$, so
$A_\Lambda = I_\Lambda$. Thus,

$$P_\Lambda = I_\Lambda I_\Lambda^\dagger = I_\Lambda (I_\Lambda^T I_\Lambda)^{-1} I_\Lambda^T = I_\Lambda I_\Lambda^T = I(\Lambda),$$

where we use $I(\Lambda)$ to denote the $M \times M$ matrix with all zeros except for ones on the diagonal
entries corresponding to the columns indexed by $\Lambda$. (We distinguish the $M \times M$ matrix $I(\Lambda)$ from
the $M \times D$ matrix $I_\Lambda$: in the former case we replace columns not indexed by $\Lambda$ with zero columns,
while in the latter we remove these columns to form a smaller matrix.) Similarly, we have

$$P_\Lambda^\perp = I - P_\Lambda = I(\Gamma). \tag{7}$$

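Thus, we observe that the matrix $P_\Lambda^\perp A = I(\Gamma) A$ is simply the matrix $A$ with zeros replacing
all entries on any row $i$ such that $i \notin \Gamma$, i.e., $(P_\Lambda^\perp A)_\Gamma = A_\Gamma$ and $(P_\Lambda^\perp A)_\Lambda = 0$. Furthermore,
Theorem 3, also found in [22], states that for $A$ satisfying the RIP of order $K + D$ with constant
$\delta$, we have that

$$\left(1 - \frac{\delta}{1 - \delta}\right)\|u\|_2^2 \le \|P_\Lambda^\perp A u\|_2^2 \le (1 + \delta)\|u\|_2^2$$

holds for all $u \in \mathbb{R}^{N+M}$ such that $\|u\|_0 = K + D - |\Lambda| = K$ and $\mathrm{supp}(u) \cap \Lambda = \emptyset$. Equivalently,
letting $\Lambda^c = \{1, 2, \ldots, N + M\} \setminus \Lambda$, this result states that $(I(\Gamma) A)_{\Lambda^c}$ satisfies the RIP of order $K$
with constant $\delta/(1 - \delta)$. To complete the proof, we note that if $(I(\Gamma) A)_{\Lambda^c}$ satisfies the RIP of order
$K$ with constant $\delta/(1 - \delta)$, then we trivially have that $I(\Gamma)\Phi$ also has the RIP of order at least $K$
with constant $\delta/(1 - \delta)$, since $I(\Gamma)\Phi$ is just a submatrix of $(I(\Gamma) A)_{\Lambda^c}$. Since $\|I(\Gamma)\Phi x\|_2 = \|\Phi_\Gamma x\|_2$,
this establishes the theorem. □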
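
The final step of the proof relies on the fact that zeroing the rows outside the retained index set gives the same norm as removing them. A minimal sketch of that identity follows; the helper names, the small dimensions, and the 0-based index set are assumptions for illustration.

```python
import math
import random

def matvec(a, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def zero_rows(a, gamma):
    """I(Gamma) A: keep rows indexed by gamma, replace all other rows with zeros."""
    keep = set(gamma)
    return [row[:] if i in keep else [0.0] * len(row) for i, row in enumerate(a)]

def select_rows(a, gamma):
    """A_Gamma: the |Gamma| x N matrix of the rows indexed by gamma."""
    return [a[i][:] for i in sorted(gamma)]

rng = random.Random(2)
M, N = 8, 5
phi = [[rng.gauss(0.0, 1.0 / math.sqrt(M)) for _ in range(N)] for _ in range(M)]
x = [rng.gauss(0.0, 1.0) for _ in range(N)]
gamma = [0, 2, 3, 6, 7]  # retained measurements (0-based indices)
lhs = norm2(matvec(zero_rows(phi, gamma), x))    # ||I(Gamma) Phi x||_2
rhs = norm2(matvec(select_rows(phi, gamma), x))  # ||Phi_Gamma x||_2
print(abs(lhs - rhs) < 1e-12)
```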

3 Discussion 

3.1 Robustness and stability 

Observe that we require roughly $O(D \log(N))$ additional measurements to ensure that $\Phi$ is
$(\widetilde{M}, K, \delta)$-democratic compared to the number of measurements required to simply ensure that $\Phi$
satisfies the RIP of order $K$. This seems intuitive; if we wish to be robust to the loss of any $D$
measurements while retaining the RIP of order $K$, then we should expect to take at least $D$ additional
measurements. This is not unique to the CS framework. For instance, by oversampling, i.e., sampling
faster than the minimum required Nyquist rate, uniform sampling systems can also improve
robustness with respect to the loss of measurements. However, a benefit of the democratic CS system
is that the number of additional measurements needed grows more slowly than in the Nyquist
case. To see this, consider the case where we lose $D$ samples or measurements. For a fixed time
period, suppose that sampling the signal at the Nyquist rate yields $N$ samples. To be robust to
the loss of a contiguous block of $D$ samples, we must sample at $D + 1$ times the Nyquist rate,
yielding $DN$ additional samples. In contrast, the number of additional measurements needed for
a CS measurement system to be democratic is $O(D \log(N))$, given by (4). Thus, the number of
additional samples required by a Nyquist sampler depends linearly on both $D$ and $N$, while the number of
additional measurements for democratic CS systems is still linear in $D$ but only logarithmic in $N$.
If $N$ is large, this can result in tremendous savings. Note also that for a fixed $N$ and $K$, by driving
$M$ higher a CS measurement system can be robust to the loss of a large fraction of the acquired
measurements, whereas in Nyquist oversampling, the fraction of (consecutive) samples that can be
dropped can never exceed $1/N$.
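
To give a feel for the two growth rates, the following sketch evaluates $DN$ against the order term $D \log N$. The function names are hypothetical, and the constant hidden in the $O(\cdot)$ notation is deliberately omitted, so the CS figure is an order of growth rather than an actual measurement count.

```python
import math

def nyquist_extra_samples(d, n):
    """Additional samples when sampling at (d + 1) times the Nyquist rate."""
    return (d + 1) * n - n

def cs_extra_measurements_order(d, n):
    """Order-of-growth term D * log(N); the constant factor is omitted."""
    return d * math.log(n)

for d, n in [(10, 2048), (10, 2 ** 20)]:
    print(d, n, nyquist_extra_samples(d, n), round(cs_extra_measurements_order(d, n), 1))
```

Increasing $N$ from $2^{11}$ to $2^{20}$ multiplies the Nyquist overhead by 512, while the logarithmic term less than doubles.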

In some applications, this difference may have significant impact. For example, in finite dynamic
range quantizers, the measurements saturate when their magnitude exceeds some level. Thus, when
uniformly sampling with a low saturation level, if one sample saturates, then the likelihood that
any of the neighboring samples will saturate is high, and significant oversampling may be required
to ensure any benefit. However, in CS, if many adjacent measurements were to saturate, then for
only a slight increase in the number of measurements we can mitigate this kind of error by simply
rejecting the saturated measurements; the fact that $\Phi$ is democratic ensures that this strategy will
be effective.

In addition to robustness, Theorem 1 implies that reconstruction from a subset of CS measurements
is stable to the loss of a potentially larger number of measurements than anticipated. To see
this, suppose that an $M \times N$ matrix $\Phi$ is $(M - D, K, \delta)$-democratic, but consider the situation
where $D + \widetilde{D}$ measurements are dropped. It is clear from the proof of Theorem 1 that if $\widetilde{D} < K$,
then the resulting matrix $\Phi_\Gamma$ will satisfy the RIP of order $K - \widetilde{D}$ with constant $\delta$. Thus, from [23],
if we define $\widetilde{K} = (K - \widetilde{D})/2$, then the reconstruction error is bounded by

$$\|x - \hat{x}\|_2 \le C_3 \frac{\|x - x_{\widetilde{K}}\|_1}{\sqrt{\widetilde{K}}} \tag{8}$$

where $x_{\widetilde{K}}$ denotes the best $\widetilde{K}$-term approximation of $x$ and $C_3$ is an absolute constant depending on
$\Phi$ that can be bounded using the constants derived in Theorem 1. Thus, if $\widetilde{D}$ is small then the
additional error caused by dropping too many measurements will also be relatively small. In contrast,
there is simply no analog to this kind of stability result for uniform sampling with linear
reconstruction. When the number of dropped samples exceeds $D$ (where $D$ represents the oversampling
factor described above), there are no guarantees as to the accuracy of the reconstruction.

3.2 Numerical exploration 

As discussed previously, the democracy property is a stronger condition than the RIP. To illustrate
this point, we perform a numerical simulation. Specifically, we would
like to compare the case where the measurements are dropped at random versus the case where
the dropped measurements are selected by an adversary. Ideally, we would like to know whether
the resulting matrices satisfy the RIP. Of course, this experiment is impossible to perform for two
reasons. First, determining if a matrix satisfies the RIP is computationally intractable, as it would
require checking all possible $K$-dimensional sub-matrices of $\Phi_\Gamma$. Second, in the adversarial setting
one would also have to search for the worst possible $\Gamma$, which is impossible for the same
reason. Thus, we instead perform a far simpler experiment, which serves as a very rough proxy to
the experiment we would like to perform.

The experiment proceeds over 100 trials as follows. We fix the parameters $N = 2048$ and $K = 13$
and vary $M$ in the range $(0, 380)$. In each trial we draw a new matrix $\Phi$ with $\phi_{ij} \sim \mathcal{N}(0, 1/M)$ and
a new signal with $K$ nonzero coefficients, also drawn from a Gaussian distribution, and then the
signal is normalized so that $\|x\|_2 = 1$. Over each set of trials we estimate two quantities:

1. the maximum $D$ such that we achieve exact reconstruction for a randomly selected
$(M - D) \times N$ submatrix of $\Phi$ on each of the 100 trials;

2. the maximum $D$ such that we achieve exact reconstruction for $R = 300$ randomly selected
$(M - D) \times N$ submatrices of $\Phi$ on each of the 100 trials.

Ideally, the second case should consider all $(M - D) \times N$ submatrices of $\Phi$ rather than just 300
submatrices, but as this is not possible (for reasons discussed above) we simply perform a random
sampling of the space of possible submatrices. Note also that exact recovery on one signal is
not proof that the matrix satisfies the RIP, although failure is proof that the matrix does not.
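
The search for the maximum $D$ in this protocol can be sketched as follows. The report does not specify the recovery algorithm or the search procedure, so the recovery step is abstracted behind a hypothetical callback `recovers(trial, m, d)`, and the sketch assumes that success is monotone in $D$. The `toy_oracle` below is a stand-in rather than a sparse solver; for the second quantity, the callback would additionally loop over the $R$ random submatrices.

```python
def max_droppable(m, num_trials, recovers):
    """Largest D such that recovers(trial, m, D) succeeds on every trial.

    Returns 0 if no positive D works.
    """
    best = 0
    for d in range(1, m):
        if all(recovers(t, m, d) for t in range(num_trials)):
            best = d
        else:
            break  # assumption: dropping more rows never turns failure into success
    return best

def toy_oracle(trial, m, d, m_min=90):
    """Hypothetical stand-in: recovery succeeds iff at least m_min rows remain."""
    return m - d >= m_min

d_max = max_droppable(200, 100, toy_oracle)
print(d_max)
```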



Figure 1: Maximum number of measurements that can be dropped, $D_{\max}$, vs. number of measurements
$M$ for (a) exact recovery of one $(M - D) \times N$ submatrix of $\Phi$ and (b) exact recovery of
$R = 300$ $(M - D) \times N$ submatrices of $\Phi$.



The results of this experiment are depicted in Figure 1. The circles denote data points, with
the empty circles corresponding to the random selection experiment and the solid circles corresponding
to the democracy experiment. The lines denote the best linear fit for each data set where
$D > 0$, with the dashed line corresponding to the random selection experiment and the solid line
corresponding to the democracy experiment.

The maximum $D$ corresponding to the random selection experiment grows linearly in $M$ (with
coefficient 1) once the minimum number of measurements required for RIP, denoted by $M'$, is
reached. This is because beyond this point at most $D = M - M'$ measurements can be discarded. As
demonstrated by the plot, $M' \approx 90$ for this experiment. For the democracy experiment $M' \approx 150$,
larger than for the random selection experiment. Furthermore, the maximum $D$ for democracy grows more
slowly than for the random selection case, which indicates that to be robust to the loss of any $D$
measurements, $CD$ additional measurements, with $C > 1$, are actually necessary.



4 Theorems 

In this section, we prove the two supporting theorems used in the proof of Theorem 1. We begin
by demonstrating that the matrix $A = [I \ \Phi]$ satisfies the RIP. To do so, we first establish the
following lemma, which closely parallels the result in equation (4.3) of [3]. The lemma demonstrates
that for any $u$, if we draw $\Phi$ at random, then $\|Au\|_2^2$ is concentrated around $\|u\|_2^2$.

Lemma 1. Let $\Phi$ be an $M \times N$ matrix with elements $\phi_{ij}$ drawn i.i.d. according to $\mathcal{N}(0, 1/M)$ and
let $A = [I \ \Phi]$. Furthermore, let $u \in \mathbb{R}^{N+M}$ be an arbitrary vector with first $M$ entries denoted by
$w$ and last $N$ entries denoted by $x$. Let $\eta \in (0, 1)$ be given. Then

$$E\left(\|Au\|_2^2\right) = \|u\|_2^2 \tag{9}$$

and

$$P\left(\left|\|Au\|_2^2 - \|u\|_2^2\right| \ge 2\eta\|u\|_2^2\right) \le 3e^{-M\eta^2/8}. \tag{10}$$

Proof. We first note that since $Au = w + \Phi x$, we have that

$$\|Au\|_2^2 = \|w + \Phi x\|_2^2 = (w + \Phi x)^T (w + \Phi x) = w^T w + 2 w^T \Phi x + x^T \Phi^T \Phi x = \|w\|_2^2 + 2 w^T \Phi x + \|\Phi x\|_2^2. \tag{11}$$

Since the entries $\phi_{ij}$ are i.i.d. according to $\mathcal{N}(0, 1/M)$, it is straightforward to show that
$E(\|\Phi x\|_2^2) = \|x\|_2^2$ (see, for example, [24]). Similarly, one can also show that
$2 w^T \Phi x \sim \mathcal{N}(0, 4\|w\|_2^2\|x\|_2^2/M)$, since the elements of $\Phi x$ are distributed as zero mean
Gaussian variables with variance $\|x\|_2^2/M$. Thus, from (11) we have that

$$E\left(\|Au\|_2^2\right) = \|w\|_2^2 + \|x\|_2^2,$$

and since $\|u\|_2^2 = \|w\|_2^2 + \|x\|_2^2$, this establishes (9).

We now turn to (10). Using the arguments in [24], one can show that

$$P\left(\left|\|\Phi x\|_2^2 - \|x\|_2^2\right| \ge \eta\|x\|_2^2\right) \le 2e^{-M\eta^2/8}. \tag{12}$$

As noted above, $2 w^T \Phi x \sim \mathcal{N}(0, 4\|w\|_2^2\|x\|_2^2/M)$. Hence, we have that

$$P\left(|2 w^T \Phi x| \ge \eta\|w\|_2\|x\|_2\right) = 2Q\left(\frac{\eta\|w\|_2\|x\|_2}{2\|w\|_2\|x\|_2/\sqrt{M}}\right) = 2Q\left(\sqrt{M}\eta/2\right),$$

where $Q(\cdot)$ denotes the tail integral of the standard Gaussian distribution. From (13.48) of [25] we
have that

$$Q(z) \le \frac{1}{2} e^{-z^2/2}$$

and thus we obtain

$$P\left(|2 w^T \Phi x| \ge \eta\|w\|_2\|x\|_2\right) \le e^{-M\eta^2/8}. \tag{13}$$

Thus, combining (12) and (13) we obtain that with probability at least $1 - 3e^{-M\eta^2/8}$ we have that
both

$$(1 - \eta)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \eta)\|x\|_2^2 \tag{14}$$

and

$$-\eta\|w\|_2\|x\|_2 \le 2 w^T \Phi x \le \eta\|w\|_2\|x\|_2. \tag{15}$$

Using (11), we can combine (14) and (15) to obtain

$$\|Au\|_2^2 \le \|w\|_2^2 + \eta\|w\|_2\|x\|_2 + (1 + \eta)\|x\|_2^2$$
$$\le (1 + \eta)\left(\|w\|_2^2 + \|x\|_2^2\right) + \eta\|w\|_2\|x\|_2$$
$$\le (1 + \eta)\|u\|_2^2 + \eta\|u\|_2^2$$
$$= (1 + 2\eta)\|u\|_2^2,$$

where the last inequality follows from the fact that $\|w\|_2\|x\|_2 \le \|u\|_2\|u\|_2$. Similarly, we also have
that

$$\|Au\|_2^2 \ge (1 - 2\eta)\|u\|_2^2,$$

which establishes (10). □
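
The expectation in (9) can be checked with a short Monte Carlo sketch. The dimensions, the number of trials, and the helper name `sq_norm_Au` are assumptions chosen for illustration; the sketch redraws $\Phi$ on every call and never forms $A$ explicitly, using $Au = w + \Phi x$.

```python
import math
import random

def sq_norm_Au(w, x, rng):
    """Draw Phi with i.i.d. N(0, 1/M) entries and return ||w + Phi x||_2^2."""
    m = len(w)
    s = 1.0 / math.sqrt(m)
    total = 0.0
    for i in range(m):
        phi_row_dot_x = sum(rng.gauss(0.0, s) * xj for xj in x)
        total += (w[i] + phi_row_dot_x) ** 2
    return total

rng = random.Random(3)
M, N, trials = 32, 16, 400
u = [rng.gauss(0.0, 1.0) for _ in range(M + N)]
scale = math.sqrt(sum(t * t for t in u))
u = [t / scale for t in u]  # normalize so that ||u||_2 = 1
w, x = u[:M], u[M:]         # first M entries and last N entries of u
mean_sq = sum(sq_norm_Au(w, x, rng) for _ in range(trials)) / trials
print(mean_sq)  # an estimate of E ||Au||_2^2, which (9) says equals ||u||_2^2 = 1
```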

We note that while the above proof assumes that the entries of $\Phi$ are Gaussian, this proof holds
with essentially no modifications for a wide class of sub-Gaussian distributions. A random variable
$X$ is sub-Gaussian if there exists a constant $C > 0$ such that

$$E\left(e^{Xt}\right) \le e^{C^2 t^2/2} \tag{16}$$

for all $t \in \mathbb{R}$. This says that the moment-generating function of our distribution is dominated by
that of a Gaussian distribution, which is also equivalent to requiring that the tails of our distribution
decay at least as fast as the tails of a Gaussian distribution. Examples of sub-Gaussian distributions
include the Gaussian distribution, the Rademacher distribution, and the uniform distribution. In
general, any distribution with bounded support is sub-Gaussian. See [26] for more discussion on
sub-Gaussian random variables. It can be shown (see Lemma 6.1 of [10] or [27]) that if the entries
of $\Phi$ are drawn according to a sub-Gaussian distribution, then (12) holds with 8 replaced by
a constant that depends on the constant $C$ in (16). Similarly, the cross-term $|2 w^T \Phi x|$ is also a
sub-Gaussian random variable, and so using elementary results in [26], a bound analogous to (13)
can be obtained.
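
As a concrete instance of (16), the Rademacher distribution (values $\pm 1$ with probability $1/2$ each) has moment-generating function $\cosh(t)$, which is dominated by $e^{t^2/2}$, so it satisfies (16) with $C = 1$. The sketch below checks this on a grid of $t$ values; the grid and the function names are assumptions for illustration.

```python
import math

def rademacher_mgf(t):
    """E[exp(X t)] for X = +1 or -1 with probability 1/2 each, i.e. cosh(t)."""
    return 0.5 * (math.exp(t) + math.exp(-t))

def gaussian_mgf_bound(t, c=1.0):
    """Right-hand side of (16): exp(c^2 t^2 / 2)."""
    return math.exp(c * c * t * t / 2.0)

ts = [k / 10.0 for k in range(-50, 51)]
dominated = all(rademacher_mgf(t) <= gaussian_mgf_bound(t) for t in ts)
print(dominated)
```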

Using Lemma 1, we now demonstrate that the matrix $A$ satisfies the RIP provided that $M$ is
sufficiently large.

Theorem 2. Let $\Phi$ be an $M \times N$ matrix with elements $\phi_{ij}$ drawn according to $\mathcal{N}(0, 1/M)$ and
let $A = [I \ \Phi]$. If

$$M = C_1 (K + D) \log\left(\frac{N + M}{K + D}\right) \tag{17}$$

then with probability exceeding $1 - 3e^{-C_2 M}$ we have that $A$ satisfies the RIP of order $(K + D)$ with
constant $\delta$, where $C_1$ is arbitrary and $C_2 = (\delta/8)^2 - \log(42e/\delta)/C_1$.

Proof. First note that it is enough to prove (2) in the case $\|x\|_2 = 1$, since $A$ is linear. Next, fix
an index set $J \subset \{1, 2, \ldots, N + M\}$ with $|J| = K + D$, and let $X_J$ denote the $(K + D)$-dimensional
subspace spanned by the columns of $A$ indexed by $J$. We choose a finite set of points $S_J$ such that
$S_J \subset X_J$, $\|s\|_2 \le 1$ for all $s \in S_J$, and for all $x \in X_J$ with $\|x\|_2 \le 1$ we have

$$\min_{s \in S_J} \|x - s\|_2 \le \epsilon. \tag{18}$$

One can show (see Chapter 15 of [28]) that such a set $S_J$ exists with $|S_J| \le (3/\epsilon)^{K+D}$. We then
repeat this process for each possible index set $J$, and collect all the sets $S_J$ together

$$S = \bigcup_{J : |J| = K + D} S_J. \tag{19}$$

There are $\binom{N+M}{K+D} \le \left(\frac{e(N+M)}{K+D}\right)^{K+D}$ possible index sets $J$, and hence
$|S| \le \left(\frac{3e}{\epsilon}\frac{N+M}{K+D}\right)^{K+D}$. We now use the union bound to apply Lemma 1 to this set of points
such that, with probability exceeding

$$1 - 3\left(\frac{3e}{\epsilon}\frac{N+M}{K+D}\right)^{K+D} e^{-M\eta^2/8} = 1 - 3e^{-\frac{M\eta^2}{8} + (K+D)\log\left(\frac{3e}{\epsilon}\frac{N+M}{K+D}\right)}, \tag{20}$$

we have

$$(1 - 2\eta)\|s\|_2^2 \le \|As\|_2^2 \le (1 + 2\eta)\|s\|_2^2, \quad \text{for all } s \in S.$$

We now define $\Sigma_{K+D} = \{x : \|x\|_0 \le K + D\}$. We define $B$ as the smallest number such that

$$\|Ax\|_2^2 \le B\|x\|_2^2, \quad \text{for all } x \in \Sigma_{K+D}, \ \|x\|_2 \le 1. \tag{21}$$

Our goal is to show that $B \le 1 + \delta$. For this, we recall that for any $x \in \Sigma_{K+D}$ with $\|x\|_2 \le 1$,
we can pick an $s \in S$ such that $\|x - s\|_2 \le \epsilon$ and such that $x - s \in \Sigma_{K+D}$ (since if $x \in X_J$, we can
pick $s \in S_J \subset X_J$ satisfying $\|x - s\|_2 \le \epsilon$). In this case we have

$$\|Ax\|_2 \le \|As\|_2 + \|A(x - s)\|_2 \le \sqrt{1 + 2\eta} + \sqrt{B}\,\epsilon.$$

Since by definition $B$ is the smallest number for which (21) holds, we obtain $\sqrt{B} \le \sqrt{1 + 2\eta} + \sqrt{B}\,\epsilon$,
which upon rearranging yields $\sqrt{B} \le \sqrt{1 + 2\eta}/(1 - \epsilon)$. One can show that by setting $\epsilon = \delta/14$ and
$\eta = \delta/(2\sqrt{2})$, we have that $\sqrt{1 + 2\eta}/(1 - \epsilon) \le \sqrt{1 + \delta}$, which establishes the upper inequality in (2).
The lower inequality follows from this since

$$\|Ax\|_2 \ge \|As\|_2 - \|A(x - s)\|_2 \ge \sqrt{1 - 2\eta} - \sqrt{1 + \delta}\,\epsilon \ge \sqrt{1 - \delta},$$

where the last inequality again holds with $\epsilon = \delta/14$ and $\eta = \delta/(2\sqrt{2})$. This establishes the theorem.
To arrive at the formula for $C_2$ we first bound the result in (20) using

$$\log\left(\frac{3e}{\epsilon}\frac{N+M}{K+D}\right) \le \log\left(\frac{3e}{\epsilon}\right)\log\left(\frac{N+M}{K+D}\right)$$

and then we replace $(K + D)\log((N + M)/(K + D))$ with $M/C_1$. After simplification, this yields
$C_2 = \eta^2/8 - \log(3e/\epsilon)/C_1$. By substituting the values for $\epsilon$ and $\eta$, we obtain the desired result. □

In Theorem 3 below, we show that the matrix $P_\Lambda^\perp A$ satisfies a modified version of the RIP. We
begin with an elementary lemma that is a straightforward generalization of Lemma 2.1 of [11], and
states that RIP operators approximately preserve inner products between sparse vectors.

Lemma 2. Let $u, v \in \mathbb{R}^N$ be given, and suppose that a matrix $A$ satisfies the RIP of order
$\max(\|u + v\|_0, \|u - v\|_0)$ with isometry constant $\delta$. Then

$$|\langle Au, Av \rangle - \langle u, v \rangle| \le \delta\|u\|_2\|v\|_2. \tag{22}$$

Proof. We first assume that $\|u\|_2 = \|v\|_2 = 1$. From the fact that

$$\|u \pm v\|_2^2 = \|u\|_2^2 + \|v\|_2^2 \pm 2\langle u, v \rangle = 2 \pm 2\langle u, v \rangle$$

and since $A$ satisfies the RIP, we have that

$$(1 - \delta)(2 \pm 2\langle u, v \rangle) \le \|Au \pm Av\|_2^2 \le (1 + \delta)(2 \pm 2\langle u, v \rangle).$$

From the parallelogram identity we obtain

$$\langle Au, Av \rangle = \frac{1}{4}\left(\|Au + Av\|_2^2 - \|Au - Av\|_2^2\right) \tag{23}$$
$$\le \frac{(1 + \langle u, v \rangle)(1 + \delta) - (1 - \langle u, v \rangle)(1 - \delta)}{2} = \langle u, v \rangle + \delta. \tag{24}$$

Similarly, one can show that $\langle Au, Av \rangle \ge \langle u, v \rangle - \delta$, and thus $|\langle Au, Av \rangle - \langle u, v \rangle| \le \delta$. The result
follows for $u, v$ with arbitrary norm from the bilinearity of the inner product. □
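
The identity used in (23) expresses an inner product through two squared norms. A minimal numerical check is sketched below; the vector length, the seed, and the function names are arbitrary choices for illustration.

```python
import random

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def inner_from_norms(a, b):
    """(||a + b||_2^2 - ||a - b||_2^2) / 4, the right-hand side of (23)."""
    plus = [s + t for s, t in zip(a, b)]
    minus = [s - t for s, t in zip(a, b)]
    return (dot(plus, plus) - dot(minus, minus)) / 4.0

rng = random.Random(4)
a = [rng.gauss(0.0, 1.0) for _ in range(20)]
b = [rng.gauss(0.0, 1.0) for _ in range(20)]
gap = abs(inner_from_norms(a, b) - dot(a, b))
print(gap < 1e-9)
```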

One consequence of this result is that sparse vectors that are orthogonal in $\mathbb{R}^N$ remain nearly
orthogonal after the application of $A$. From this observation, it was demonstrated independently
in [22] and [17] that if $A$ has the RIP, then $P_\Lambda^\perp A$ satisfies a modified version of the RIP.

Theorem 3. Suppose that $A$ satisfies the RIP of order $K$ with isometry constant $\delta$, and let
$\Lambda \subset \{1, 2, \ldots, N\}$. Define $P_\Lambda^\perp$ as in (6). If $|\Lambda| < K$ then

$$\left(1 - \frac{\delta}{1 - \delta}\right)\|u\|_2^2 \le \|P_\Lambda^\perp A u\|_2^2 \le (1 + \delta)\|u\|_2^2 \tag{25}$$

for all $u \in \mathbb{R}^N$ such that $\|u\|_0 \le K - |\Lambda|$ and $\mathrm{supp}(u) \cap \Lambda = \emptyset$.

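Proof. From the definitions of $P_\Lambda$ and $P_\Lambda^\perp$ in (5) and (6), we may decompose $P_\Lambda^\perp A u$ as
$P_\Lambda^\perp A u = A u - P_\Lambda A u$. Since $P_\Lambda$ is an orthogonal projection, we can write

$$\|Au\|_2^2 = \|P_\Lambda A u\|_2^2 + \|P_\Lambda^\perp A u\|_2^2. \tag{26}$$

Our goal is to show that $\|Au\|_2 \approx \|P_\Lambda^\perp A u\|_2$, or equivalently, that $\|P_\Lambda A u\|_2$ is small. Towards
this end, we note that since $P_\Lambda A u$ is orthogonal to $P_\Lambda^\perp A u$, we have

$$\langle P_\Lambda A u, A u \rangle = \langle P_\Lambda A u, P_\Lambda A u + P_\Lambda^\perp A u \rangle = \langle P_\Lambda A u, P_\Lambda A u \rangle + \langle P_\Lambda A u, P_\Lambda^\perp A u \rangle = \|P_\Lambda A u\|_2^2. \tag{27}$$

Since $P_\Lambda$ is a projection onto $\mathcal{R}(A_\Lambda)$ there exists a $z \in \mathbb{R}^N$ with $\mathrm{supp}(z) \subset \Lambda$ such that $P_\Lambda A u = A z$.
Furthermore, by assumption, $\mathrm{supp}(u) \cap \Lambda = \emptyset$. Hence $\langle u, z \rangle = 0$ and from the RIP and Lemma 2,

$$\frac{|\langle P_\Lambda A u, A u \rangle|}{\|P_\Lambda A u\|_2 \|A u\|_2} = \frac{|\langle A z, A u \rangle|}{\|A z\|_2 \|A u\|_2} \le \frac{|\langle A z, A u \rangle|}{(1 - \delta)\|z\|_2\|u\|_2} \le \frac{\delta}{1 - \delta}.$$

Combining this with (27), we obtain

$$\|P_\Lambda A u\|_2 \le \frac{\delta}{1 - \delta}\|A u\|_2.$$

Since we trivially have that $\|P_\Lambda A u\|_2 \ge 0$, we can combine this with (26) to obtain

$$\left(1 - \left(\frac{\delta}{1 - \delta}\right)^2\right)\|A u\|_2^2 \le \|P_\Lambda^\perp A u\|_2^2 \le \|A u\|_2^2.$$

Since $\|u\|_0 \le K$, we can use the RIP to obtain

$$\left(1 - \left(\frac{\delta}{1 - \delta}\right)^2\right)(1 - \delta)\|u\|_2^2 \le \|P_\Lambda^\perp A u\|_2^2 \le (1 + \delta)\|u\|_2^2,$$

which simplifies to (25). □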
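
For completeness, the simplification of the lower constant in the last display can be spelled out as follows; this is routine algebra added for the reader, not a step taken from the report.

```latex
\left(1 - \frac{\delta^2}{(1-\delta)^2}\right)(1-\delta)
  = \frac{(1-\delta)^2 - \delta^2}{1-\delta}
  = \frac{1 - 2\delta}{1-\delta}
  = 1 - \frac{\delta}{1-\delta}.
```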